Compare Me Maybe: Crowd Entity Resolution Interfaces

Authors

  • Steven Euijong Whang
  • Julian McAuley
  • Hector Garcia-Molina

Abstract

We study the problem of enhancing entity resolution (ER) with the help of crowdsourcing. ER is the problem of identifying records that refer to the same real-world entity, and it can be an extremely difficult process for computer algorithms alone. For example, figuring out which images refer to the same person can be a hard task for computers but an easy one for humans. An important component of crowdsourcing is the interface through which humans and algorithms interact. In this paper, we explore how the interface design, along with other factors, affects the quality of human record comparisons. We also propose a model for separating good human workers from bad workers. Our analysis is based on extensive experiments on Amazon Mechanical Turk using real and synthetic image datasets.
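The abstract does not describe the paper's interfaces or its worker model in detail. The following is therefore only a rough illustration of the general crowd ER pipeline it refers to, not the authors' method. It is a minimal Python sketch that assumes a pairwise yes/no interface, where each worker answers "do these two records refer to the same entity?". Pairs are accepted by simple majority vote, accepted matches are merged into entities with union-find (transitive closure), and workers are screened by their accuracy on gold-standard pairs with known answers. The function names, the gold-question screening, and the 0.7 threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict

def filter_workers(gold, worker_answers, min_accuracy=0.7):
    """Hypothetical screen: keep workers whose accuracy on gold pairs meets min_accuracy.

    gold: dict mapping (record_a, record_b) -> bool (known answer)
    worker_answers: dict mapping worker_id -> dict of pair -> bool
    """
    good = set()
    for worker, answers in worker_answers.items():
        # Grade only the gold pairs this worker actually answered.
        graded = [answers[p] == truth for p, truth in gold.items() if p in answers]
        if graded and sum(graded) / len(graded) >= min_accuracy:
            good.add(worker)
    return good

def majority_matches(votes):
    """votes: dict mapping (record_a, record_b) -> list of bool worker answers.

    Returns the pairs that a strict majority of workers marked as the same entity.
    """
    return [pair for pair, answers in votes.items()
            if sum(answers) * 2 > len(answers)]

def resolve(records, matched_pairs):
    """Group records into entities by taking the transitive closure of matches."""
    parent = {r: r for r in records}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving keeps trees shallow
            r = parent[r]
        return r

    for a, b in matched_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    clusters = defaultdict(set)
    for r in records:
        clusters[find(r)].add(r)
    # Sorted output makes the result deterministic and easy to compare.
    return sorted(sorted(c) for c in clusters.values())
```

For example, with three workers voting on two image pairs, `majority_matches({("img1", "img2"): [True, True, False], ("img3", "img4"): [True, True, True]})` accepts both pairs, and `resolve` then returns `[["img1", "img2"], ["img3", "img4"]]`. A real system would also decide which pairs to show, how to lay them out in the interface, and how to weight workers instead of simply discarding them. Those are the kinds of design questions the paper studies.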


Similar resources

A Fun and Engaging Interface for Crowdsourcing Named Entities

There are many current problems in natural language processing that are best solved by training algorithms on an annotated in-language, in-domain corpus. The more representative the training corpus is of the test data, the better the algorithm will perform, but also the less likely it is that such a corpus has already been annotated. Annotating corpora for natural language processing tasks is t...

Broad Twitter Corpus: A Diverse Named Entity Recognition Resource

One of the main obstacles, hampering method development and comparative evaluation of named entity recognition in social media, is the lack of a sizeable, diverse, high quality annotated corpus, analogous to the CoNLL’2003 news dataset. For instance, the biggest Ritter tweet corpus is only 45 000 tokens – a mere 15% the size of CoNLL’2003. Another major shortcoming is the lack of temporal, geog...

Fault-Tolerant Entity Resolution with the Crowd

In recent years, crowdsourcing has increasingly been applied as a means to enhance data quality. Although the crowd generates insightful information, especially for complex problems such as entity resolution (ER), the output of crowd workers is often noisy. That is, workers may unintentionally generate false or contradictory data even for simple tasks. The challenge that we address in this pap...

A Theoretical Analysis of First Heuristics of Crowdsourced Entity Resolution

Entity resolution (ER) is the task of identifying all records in a database that refer to the same underlying entity, and are therefore duplicates of each other. Due to the inherent ambiguity of data representation and poor data quality, ER is a challenging task for any automated process. As a remedy, human-powered ER via crowdsourcing has become popular in recent years. Using the crowd to answer queri...

Attribute-based Crowd Entity Resolution: Technical Report

We study the problem of using the crowd to perform entity resolution (ER) on a set of records. For many types of records, especially those involving images, such a task can be difficult for machines, but relatively easy for humans. Typical crowd-based ER approaches ask workers for pairwise judgments between records, which quickly becomes prohibitively expensive even for moderate numbers of reco...

Publication date: 2012